Automatisk splitting av sammensatte ord: et lingvistisk hjelpemiddel for tekstsøking (Automatic splitting of compound words: A linguistic aid for text search) [In Norwegian]
Abstract
Compound words cause problems for various forms of automatic analysis of the vocabulary in a text, for example in frequency studies. The problem is that the meaning of a compound word can in many cases also be expressed by a phrase made up of the corresponding uncompounded words. In text search, for example, compound words can prevent users from finding the documents they are looking for, because the wording of the search argument does not match the wording of the documents. If, for example, one searches only for a compound word without splitting it into its individual components, one will not find the texts in which all the components of the compound are mentioned but separated from one another. Against this background, a method was developed for the automatic splitting of compound words. The method is based on a set of about 1,000 rules rather than on a lexicon.
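The abstract does not list the rules themselves, so the following is only a toy sketch of the general idea under stated assumptions: a rule-driven splitter (no word list is consulted at split time) used to expand both query and document words, so that the compound *tekstsøking* ("text search") also matches a text that mentions *tekst* and *søking* separately. The three entries in `BOUNDARY_RULES`, the minimum component length `MIN_PART = 3`, and the function names `split_compound`, `components` and `matches` are all hypothetical, invented for illustration; they are not the roughly 1,000 rules of the original method.

```python
import re

# Hypothetical boundary rules, NOT the ~1000 rules of the original method.
# A rule (left, right) allows a component boundary between a prefix of the
# word ending in `left` and a remainder starting with `right`.
BOUNDARY_RULES = [
    ("st", "sø"),  # e.g. tekst|søking
    ("ns", "st"),  # e.g. frekvens|studier
    ("rd", "br"),  # e.g. ord|bruk
]

MIN_PART = 3  # assumed minimum component length, to avoid tiny fragments


def split_compound(word: str) -> list[str]:
    """Split at the leftmost position allowed by a rule, then recurse on both parts."""
    word = word.lower()
    for i in range(MIN_PART, len(word) - MIN_PART + 1):
        left, right = word[:i], word[i:]
        for suffix, prefix in BOUNDARY_RULES:
            if left.endswith(suffix) and right.startswith(prefix):
                return split_compound(left) + split_compound(right)
    return [word]  # no rule applies: treat the word as a single component


def components(text: str) -> set[str]:
    """All components of all words in `text`, after compound splitting."""
    return {part for word in re.findall(r"\w+", text) for part in split_compound(word)}


def matches(query: str, document: str) -> bool:
    """True if every component of the query occurs somewhere in the document."""
    return components(query) <= components(document)


doc = "Søking i lang tekst er vanskelig"  # "Searching in long text is difficult"
print(matches("tekstsøking", doc))        # True
print("tekstsøking" in re.findall(r"\w+", doc.lower()))  # False: no exact token match
```

With plain token matching, `tekstsøking` never matches the example sentence; after splitting, both of its components are found there. The sketch ignores linking elements (such as the linking *s* in many Norwegian compounds) and does nothing to block false splits.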
Similar resources
Splitting of Compound Terms in non-Prototypical Compounding Languages
Compounding is present in a large variety of languages in different proportions. Compound rate in the text obviously depends on the language, but also on the genre and the domain. Scientific and technical texts are especially conducive to compounding, even in the languages that are not traditionally admitted as highly compounding ones. In this article we address compound splitting of specialize...
Text Segmentation into Paragraphs Based on Local Text Cohesion
The problem of automatic text segmentation is subcategorized into two different problems: thematic segmentation into rather large, topically self-contained sections and splitting into paragraphs, i.e., lexico-grammatical segmentation of a lower level. In this paper we consider the latter problem. We propose a method of reasonably splitting text into paragraphs based on a text cohesion measure. Speci...
Integrated JIT Lot-Splitting Model with Setup Time Reduction for Different Delivery Policy using PSO Algorithm
This article develops an integrated JIT lot-splitting model for a single supplier and a single buyer. In this model we consider reduction of setup time, and the optimal lot size is obtained due to reduced setup time in the context of joint optimization for both buyer and supplier, under deterministic conditions with a single product. Two cases are discussed: Single Delivery (SD) case, and Multi...
Key Issues in Vowel Based Splitting of Telugu Bigrams
Splitting compound Telugu words into their components or root words is one of the important, tedious and yet inaccurate tasks of Natural Language Processing (NLP). Except in a few special cases, at least one vowel is necessarily involved in Telugu conjunctions. In the result, vowels are often repeated as they are or are converted into other vowels or consonants. This paper describes issues invol...
A Sandhi Splitter for Malayalam
Sandhi splitting is the primary task for computational processing of text in Sanskrit and Dravidian languages. In these languages, words can join together with morpho-phonemic changes at the point of joining. This phenomenon is known as Sandhi. Sandhi splitter splits the string of conjoined words into individual words. Accurate execution of sandhi splitting is crucial for text processing tasks ...